dgit.raspbian.org Git - babl.git/commit

author	Debarshi Ray <debarshir@gnome.org>
	Sat, 28 Apr 2018 23:15:46 +0000 (01:15 +0200)
committer	Debarshi Ray <debarshir@gnome.org>
	Mon, 14 May 2018 12:52:32 +0000 (14:52 +0200)
commit	b65deef160efa97af06ce4282f72b169a441a6a9
tree	8761c2eaa77855ad6cf1bcbe6ccf780c2e430934
parent	a12da5f6e948f86a07959df51fcd77bde2973a25

CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float"

On an Intel i7 Haswell, it now takes 0.13s to convert a 15 megapixel
buffer from "RGBA float" to "CIE Lab alpha float" instead of the
earlier 0.27s.

SSE2 has no integer division and no packed 32-bit integer
multiplication, and using bit shifts to implement integer divisions by
powers of 2 seems to introduce errors. Therefore, it was problematic to
use the cube root approximation from Hacker's Delight, which uses quite
a few integer divisions to make the initial guess. Halley's method of
approximating the cube root seems more SSE2-friendly, because its
initial guess requires only one integer division (of the float's bit
pattern by 3), which we can manage by jumping through a relatively
small number of hoops.
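A minimal SSE2 sketch of this approach, assuming positive, finite inputs. The helper names cbrt_halley_sse2 and cbrt_buffer_sse2 are hypothetical. The divide-by-3 trick shown here (convert the bit pattern to float, multiply by 1/3, truncate back to an integer) is one possible way of jumping through those hoops, not necessarily the one used in extensions/CIE.c. The magic constant 709921077 is the value commonly quoted for this bit-trick initial guess; treat it as an assumption.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: approximate the cube root of four positive,
   finite floats using a bit-trick guess plus two Halley iterations. */
static inline __m128
cbrt_halley_sse2 (__m128 x)
{
  const __m128  one_third = _mm_set1_ps (1.0f / 3.0f);
  const __m128  two       = _mm_set1_ps (2.0f);
  const __m128i magic     = _mm_set1_epi32 (709921077); /* assumed constant */

  /* Initial guess: bits(x) / 3 + magic. SSE2 has no integer division,
     so divide in floating point instead. Bit patterns of positive floats
     fit in a signed 32-bit int; the low bits lost in the int -> float
     conversion only perturb the guess slightly. */
  __m128i bits  = _mm_castps_si128 (x);
  __m128  fbits = _mm_cvtepi32_ps (bits);
  __m128i third = _mm_cvttps_epi32 (_mm_mul_ps (fbits, one_third));
  __m128  a     = _mm_castsi128_ps (_mm_add_epi32 (third, magic));

  /* Two Halley iterations for a^3 = x: a = a * (a^3 + 2x) / (2a^3 + x). */
  for (int k = 0; k < 2; k++)
    {
      __m128 a3  = _mm_mul_ps (_mm_mul_ps (a, a), a);
      __m128 num = _mm_add_ps (a3, _mm_mul_ps (two, x));
      __m128 den = _mm_add_ps (_mm_mul_ps (two, a3), x);
      a = _mm_div_ps (_mm_mul_ps (a, num), den);
    }
  return a;
}

/* Hypothetical helper: cube roots of a buffer whose length is a
   multiple of 4 (no scalar tail handling in this sketch). */
static void
cbrt_buffer_sse2 (const float *src, float *dst, long n)
{
  assert (n % 4 == 0);
  for (long k = 0; k < n; k += 4)
    _mm_storeu_ps (dst + k, cbrt_halley_sse2 (_mm_loadu_ps (src + k)));
}
```

Doing the division in floating point keeps everything within SSE2, at the cost of a slightly less precise initial guess, which the Halley iterations then correct.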

The scalar version of Halley's method seems to have originated from
http://metamerist.com/cbrt/cbrt.htm but that's not accessible anymore.
At present there's a copy in CubeRoot.cpp in the Skia sources that's
licensed under a BSD-style license. There's some discussion on the
implementation at http://www.voidcn.com/article/p-gpwztojr-wt.html.

Note that Darktable also has an SSE2 version of the same algorithm,
but uses only a single iteration of Halley's method, which is too
coarse.
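To illustrate why a single iteration can be too coarse, here is a scalar sketch that measures the worst relative residual |y^3 - x| / x (roughly three times the relative error in y) over evenly spaced samples, after one and after two Halley iterations. It does not reproduce Darktable's or babl's code; the helper names are hypothetical, the magic constant 709921077 is the same assumed value as above, and inputs are assumed positive and finite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-trick initial guess: reinterpret the float as an integer, divide
   by 3 and add an assumed magic constant. Positive, finite x only. */
static float
cbrt_guess (float x)
{
  uint32_t i;
  assert (x > 0.0f);
  memcpy (&i, &x, sizeof i);
  i = i / 3u + 709921077u;
  memcpy (&x, &i, sizeof x);
  return x;
}

/* One Halley step for a^3 = r: a * (a^3 + 2r) / (2a^3 + r). */
static float
halley_step (float a, float r)
{
  float a3 = a * a * a;
  return a * (a3 + r + r) / (a3 + a3 + r);
}

/* Hypothetical helper: cube root with a configurable iteration count. */
static float
cbrt_halley (float x, int iterations)
{
  float a = cbrt_guess (x);
  for (int k = 0; k < iterations; k++)
    a = halley_step (a, x);
  return a;
}

/* Worst |y^3 - x| / x over n evenly spaced samples in [lo, hi],
   with the residual evaluated in double precision. */
static double
max_rel_residual (int iterations, float lo, float hi, int n)
{
  double worst = 0.0;
  for (int k = 0; k < n; k++)
    {
      float  x   = lo + (hi - lo) * (float) k / (float) (n - 1);
      double y   = cbrt_halley (x, iterations);
      double err = (y * y * y - x) / x;
      if (err < 0.0)
        err = -err;
      if (err > worst)
        worst = err;
    }
  return worst;
}
```

Halley's method converges cubically, so each iteration roughly cubes the relative error of the previous estimate; starting from a guess that is only good to a few percent, the second iteration is what brings the result close to single-precision rounding.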

Here's some more discussion on the cube root approximation algorithms:
https://bugzilla.gnome.org/show_bug.cgi?id=791837

https://bugzilla.gnome.org/show_bug.cgi?id=795686

extensions/CIE.c
extensions/Makefile.am